Age Dependent Document Priors in Link Structure Analysis
Authors
Abstract
Much research has investigated how links between web pages can be exploited in an Information Retrieval setting [1, 4]. In this poster, we investigate the application of the Barabási-Albert model to link structure analysis on a collection of web documents within the language modeling framework. Our model treats the web structure as a scale-free network and derives a document prior from a web document's age and linkage. Preliminary experiments indicate the utility of our approach over other current link structure algorithms and warrant further research.
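The abstract does not give the prior's formula, so the following is only a minimal sketch of what an age-dependent prior could look like, not the poster's actual method. It relies on the standard mean-field result for the Barabási-Albert model, in which a node that joins at time t_i has expected degree proportional to sqrt(t / t_i) at time t, and divides each document's in-link count by that expected growth so that old pages are not favored for age alone. The function name, the input layout, the additive smoothing constant, and the use of arrival time steps as the age unit are all assumptions made for illustration.

```python
import math


def age_normalized_priors(docs, now, smoothing=1.0):
    """Return a normalized document prior that discounts in-links by age.

    Hypothetical helper, not taken from the poster.
    docs: dict mapping doc_id -> (inlink_count, creation_time), where
          creation_time is in model time steps and 0 < creation_time <= now.
    smoothing: assumed additive constant so documents with no in-links
               still receive a non-zero prior.
    """
    scores = {}
    for doc_id, (inlinks, created) in docs.items():
        if not 0 < created <= now:
            raise ValueError(f"invalid creation time for {doc_id!r}")
        # Barabasi-Albert mean-field growth: degree scales as sqrt(now / created).
        expected_growth = math.sqrt(now / created)
        scores[doc_id] = (inlinks + smoothing) / expected_growth
    total = sum(scores.values())
    # Normalize so the scores form a probability distribution over documents.
    return {doc_id: score / total for doc_id, score in scores.items()}


# Two documents with equal in-link counts but very different ages.
example = {"old_page": (10, 1.0), "new_page": (10, 100.0)}
priors = age_normalized_priors(example, now=100.0)
```

In a language modeling ranker, a prior of this kind would typically be combined with the query likelihood by adding log P(d) to log P(q | d). In the example, the newer page receives the larger prior because it gathered the same number of in-links in far less time.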
Similar resources
Language Models for Searching in Web Corpora
We describe our participation in the TREC 2004 Web and Terabyte tracks. For the web track, we employ mixture language models based on document full-text, incoming anchor text, and document titles, with a range of web-centric priors. We provide a detailed analysis of the effect on relevance of document length, URL structure, and link topology. The resulting web-centric priors are applied to three...
Combining Structural Information and the Use of Priors in Mixed Named-Page and Homepage Finding
This paper presents Carnegie Mellon University's experiments on the mixed named-page and homepage finding task of the TREC 12 Web Track. Our results were strong; we achieved this success using language models estimated by combining information from document text, in-link text, and information present in the structure of the documents. We also present experiments using expectations about poster...
Language Model Document Priors based on Citation and Co-citation Analysis
Citation, an integral component of research papers, implies a certain kind of relevance that is not well captured in current Information Retrieval (IR) research. In this paper, we explore incorporating citation and co-citation analysis results into the IR modeling process. We go beyond the general uniform document prior assumption in the language modeling framework through deriving doc...
Persian Printed Document Analysis and Page Segmentation
This paper presents a hybrid method, combining low-resolution and high-resolution analysis, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments the document image into a set of regions. In the high-resolution page segmentation, by connected components analysis, each region is segmented to homogeneous regions and identifyi...
Investigating the Relationship between Population Structure and Poverty
Introduction: Poverty reduction is one of the important macroeconomic goals of any country, but achieving this important issue requires examining the factors affecting it. Changing the age structure of the population is one of the effective factors in reducing poverty in countries. Therefore, governments can make the most of their population, given the capacity of countries and providing the ne...
Publication date: 2005